[...] designed to develop a psycholinguistics of comprehension and memory
for meaningful written prose paragraphs. The approach departs from
most previous ones by seeking to formulate an explicit theory
instead of relying on informal qualitative judgments as to paragraph
structure, the scoring of data, and the processes of comprehension
and memory. The paper discusses overall methodological principles and
assumptions designed to yield as results the specific representations
of paragraphs and presents a means for [...]
analysis of the paragraph. Experiments intended to aid in perfecting
the methodology are described along with results which provide a
relatively objective and complete method for scoring recall protocols. A
bibliography is included. (Author/VM)
Author's Abstract

Comprehension and memory for prose is the topic of the reported
research. A psycholinguistic analysis is undertaken consisting of
two phases: the development of a linguistic description of paragraph
structure and the conducting of experiments to ascertain psychological
correlates of the structure. A third phase, formulation of a process
model acting on the structure to produce the data, is envisioned in
future extensions of the theory.

To date, the major focus has been on constructing the linguistic
descriptions. In this respect the approach is unlike most others
proposed by psychologists. An explicit model of structure is claimed to
be a methodological prerequisite for a psychological investigation,
not only for creating process models but also for scoring data and
formulating predictive indices. For the present purpose, the aim is
to develop the model at a level of generality sufficient for an
explicit characterization of individual experimental passages, but
admittedly falling short of the generality traditionally sought in
linguistic semantic theory. The approach has proceeded inductively
from detailed analyses of individual paragraphs, appealing to and
attempting to explicate the theorist's semantic intuitions. Results
to date are promising, but the evolved principles must be further
explicated and generalized. Applications to experiments confirm this
advance in methodology for studying prose comprehension and memory.






Final Report 



Project No. DBS-0224 
Grant No. OEG-S-9-150400-4006 057 



PARAGRAPH STRUCTURE AND PARAGRAPH COMPREHENSION



Edward J. Crothers 
Dept. of Psychology
University of Colorado
Boulder, Colo. 80302



August 1971 



The research reported herein was performed pursuant to a grant with
the Office of Education, U. S. Department of Health, Education, and
Welfare. Contractors undertaking such projects under Government
sponsorship are encouraged to express freely their professional
judgment in the conduct of the project. Points of view or opinions stated
do not, therefore, necessarily represent official Office of Education
position or policy.



U. S. DEPARTMENT OF HEALTH, EDUCATION, AND WELFARE 
Office of Education

Committee on Basic Research in Education 
(National Research Council) 







Preface 



This report is only a summary of the research, because it has
been reported in detail in two technical reports previously submitted
to the U. S. Office of Education:

Crothers, E. J. The psycholinguistic structure of knowledge. University
of Colorado, Department of Psychology. Technical Report. Nov.,
1970. 1-93.

Crothers, E. J. Memory structure and the recall of discourse.
Technical Report CLIPR-4, April, 1971. 1-74.

An earlier draft of the first paper was presented at COBRE Research
Workshop on Cognitive Organization and Psychological [...],
1970. A later version of it will appear in the proceedings of that
workshop, to be published by the National Academy of Sciences. The
second paper was presented at COBRE Research Workshop [...]
Comprehension and the Acquisition of Knowledge, April, 1971. It will
appear in a volume on the proceedings of the workshop.
The aim of this research is to develop a psycholinguistics of
the comprehension and memory for meaningful written prose paragraphs.
The approach departs from most previous ones by seeking to formulate
an explicit theory instead of relying on informal qualitative judgments
as to the paragraph structure, the scoring of data, and the processes
of comprehension and memory. Where explicit models have previously
been proposed by others, a closer analysis has revealed that such
models are not in fact designed to explain comprehension and memory
for prose. In particular, the computer simulation of semantic memory
(Simmons & Slocum, 1970; Quillian, 1963, 1969) involves mainly the
retrieval of highly overlearned facts from long-term memory (LTM).
Little is said about how new information gets comprehended and
assimilated into the LTM schema. Several formal psychological and
linguistic-rhetorical approaches do exist, but they yield superficial
descriptions of the structure, both of the stimulus paragraph and of
the response protocol paragraph. Either all of the content save for
its abstract logical properties is discarded (Dawes, 1966; [...],
1969; Frederiksen, 1971), or else the passage is reduced to highly
abstract outline headings such as "execution of means" (-oriol &
Hollenbach, 1970) which at best are a very incomplete description.
Another fundamental defect in most of these approaches, and in others
as well (Harris, 1963; Katz & Fodor, 1963), is that the essence of
paragraph organization, namely that it is built around a theme (topic,
gist, abstract), is not represented by the theory. Finally, many of
the approaches represent only the underlying semantic content but
fail to represent the "surface" properties of the actual text itself,
such as its particular pattern of syntactic reduction, implied
propositions, etc., which do not change the content, but do selectively
affect its salience. The objection to such an incomplete representation
is that comprehension and memory clearly will depend not only on the
semantic content, but also on the "emphasis", by whatever term it
might be called (e.g., "form", "style", "foregrounding").

Thus the first stage of the current program is essentially
methodological (or linguistic): to formulate a theory of the underlying
and surface structures of prose. This is certainly a formidable
problem, since one confronts many of the mysteries of meaning which
have confounded semantic theorists and philosophers for centuries.
Once progress has occurred on this, at least to a modest degree of
generality sufficient to support construction and scoring of
experimental paragraphs, the program enters the second stage. Here
pertinent experiments are conducted for two purposes: to discover
empirical correlates of the structure and thereby to draw inferences
pursuant to the third stage, a process model complementing the
structure model. As will be summarized here, my efforts and progress have
been quite promising on the first stage, satisfactory on the
conceptually simpler second stage, and virtually nil on the third stage. The
focus of this report will be on methods and results for the first two
stages (structure and experiments, respectively), which will be
discussed in that order.
Method: psycholinguistic structure

Under this rubric fall the overall methodological principles and
assumptions adopted to yield as results the specific representations
of paragraphs. They are:

1. The structure model is evolved inductively, by analysing individual
paragraphs in detail and then formulating overall conclusions.

2. A single passage of one or a few paragraphs is a proper unit
for analysis.

3. The present application is to primarily descriptive prose;
extensions to narrative and exhortative prose seem feasible, but
nonthematic prose is outside the scope of the analyses.

4. The structure model for a particular paragraph is conceptually
distinct from the process model, which specifies general operations
capable of acting on many particular structures.

5. The structure model represents only the content, whether
stated or implied, of the passage itself, plus the surface properties
(which do not change the content). Definitions of words in the
paragraph are relegated to an LTM component not formulated in the present
theory.

6. The structure model must identify the theme (gist, abstract)
of a passage. In addition, it must represent the nonthematic content
as well.

7. Pursuant to conditions 5 and 6, the semantic analysis must
freely resort to semantic intuition, especially to explicate implicit
superordinates and other implications. Generally speaking, recourse to
superordinates is allowed only when it exhibits the relationship among
coordinates in the paragraph or ones in data.

8. The structure model must be the foundation for defining
measures, especially indices of difficulty, accuracy of recall, and
"centrality" or "theme-ness" of individual statements in the passage.

By and large, the rationale for these principles is that there is
simply no other viable way to begin. In particular, the appeal to
intuition is not unlike the method in current semantic theories (e.g.
Chafe, 1970). Objectivity is sought by successive approximations.
Once consensual agreement is won, further analyses are done to
explicate the grounds for the consensus. Without exploiting one's
unformalized semantic knowledge, all that is possible is a superficial
analysis.
Results: psycholinguistic structure

To date, what has been analyzed (underlying structure, mostly) is
one passage of from one to four paragraphs on each of the following
topics: nebulae, oceanography, steel production, and selenium.
Analyses are partially completed for paragraphs on red blood cells, fresco
painting, seed testing, the hari tribe, and the government of the
fictitious Circle Island. The later analyses proceed more quickly now
that the method is becoming more standardized. However, it certainly
cannot be claimed that a set of principles sufficient even for
descriptive prose has been formulated yet.

The analysis yields a tree graph representation of a paragraph's
underlying structure, save for several notable features. One is that
the subtrees for "parenthetic" subtopics are not dominated in the
graph by the main root node which corresponds to the theme. The other
is that the tree graph is augmented by statements which enumerate the
coreferentiality mappings between different subtrees. For example, if
objects were classified jointly by size and shape to yield two subtrees,
the mapping statements would stipulate which sizes went with which shapes.
These two departures from a conventional tree graph are of course
departures from an outline equivalent to the graph. Another important
difference, of course, is that in the graph the linear ordering of the
subtrees is arbitrary, whereas in an outline it generally conforms more
or less to the sequencing of sentences in the text. This particular
difference between a conventional outline and the present graph disappears
later, when that graph is replaced by the superficial (foregrounded) one,
but the other two differences remain. In other respects, the
representation resembles a tree graph of a very detailed outline, one which does
include all the semantic content and not just the abstract headings. Each
node corresponds to a sentence, either simple, compound, or complex. All
text sentences, even the implied ones, are made explicit in the graph.
Often, however, sentences are transformed (by semantic paraphrasing as
well as syntactic transforming) in order to normalize them in the graph.
Criteria for graphically subordinating one sentence to another, though
still subject to revision, are about as follows. If Sentence B differs
from Sentence A only by the presence of restrictive modifiers in B (e.g.
syntactic modifiers or ones of lexical implication) then B is subordinate
to A. Thus B implies A, and the implication ensues by deleting the
modifier. When an implication requires more than one premise, they are treated
as coordinate to one another and subordinate to the implication.
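
The representation just described can be pictured with a small data
structure. The following is only an illustrative sketch in Python, not
part of the reported method: the class names, the example sentences, and
the word-deletion test are all assumptions introduced here. In
particular, the subordination test is a crude surface proxy (B reduces to
A by deleting words), whereas the criterion in the text rests on semantic
intuition and also covers modifiers of lexical implication.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """One normalized sentence, stated or implied, in the underlying graph."""
    sentence: str
    implied: bool = False
    children: list[Node] = field(default_factory=list)

    def add(self, child: Node) -> Node:
        """Attach `child` as a subordinate sentence and return it."""
        self.children.append(child)
        return child

    def size(self) -> int:
        """Count this node plus every node subordinate to it."""
        return 1 + sum(c.size() for c in self.children)


@dataclass
class ParagraphStructure:
    """Tree graph plus the two departures from a conventional tree."""
    theme_root: Node                                  # root of the principal subtree
    parenthetic_roots: list[Node] = field(default_factory=list)  # not under the theme
    mappings: list[tuple[str, str]] = field(default_factory=list)  # coreference pairs


def differs_only_by_added_words(a: str, b: str) -> bool:
    """Assumed proxy: True if deleting words from B yields A (so B would sit under A)."""
    words_a = a.lower().rstrip(".").split()
    words_b = b.lower().rstrip(".").split()
    remaining = iter(words_b)
    # A must be a proper, in-order subsequence of B.
    return len(words_b) > len(words_a) and all(w in remaining for w in words_a)


# Hypothetical size-and-shape example in the spirit of the one in the text.
root = Node("Objects differ in size and shape.", implied=True)
sizes = root.add(Node("Objects are large or small."))
shapes = root.add(Node("Objects are round or square."))
aside = Node("The objects were first described long ago.")
paragraph = ParagraphStructure(
    theme_root=root,
    parenthetic_roots=[aside],
    mappings=[("large", "round"), ("small", "square")],
)
```

Keeping the mapping statements and the parenthetic subtrees outside the
main tree mirrors the two departures from a conventional tree graph noted
above; the order of `children` carries no meaning in this sketch.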



Here as in a typical outline, the most problematic aspect is the
postulating of superordinates not stated explicitly. As originally
conceived, the rule for so doing was that a new superordinate is admissible
only if it serves to explicate intuitively sensed relationships among
stated sentences (and recursively, among any sentences already
introduced). Superordinates which express abstractions without uniting two
or more subordinates could be added almost without limit, and seemingly
without motivation. On later analysis, however, it appears that such
redundant superordinates cannot be avoided entirely, because sometimes
they do appear in data. It might be noted briefly that such an analysis
sometimes reveals contradictions or ambiguities inherent in the passage.
For example, a frequent ambiguity is the failure to state or imply any
correspondence between different subtrees.

Major problems which are still only partly resolved include the
treatment of pragmatic inferences, the treatment of parallelism,
improving the notation (especially for quantifiers, negations, and logical
connectives), and developing a general semantic-logical taxonomy of the
bases for subordination and coordination (e.g., quantification, lexical
implication, etc.).

Given the underlying graph, the final step is to derive what might
be called the "superficial structure" or perhaps the "foregrounded
structure". Foregrounding refers to selective emphasis, and is
undoubtedly a potent determiner of comprehension and memory. Hence it is
vital to a theory. Unfortunately, this issue is little understood, and
has only recently come under serious scrutiny by linguists. Apparently
the foregrounded structure should also be a graph, though this point
was not recognized in my earlier papers cited previously. Conceptually,
this structure lies intermediate between the actual paragraph and the
underlying structure. An important assumption here is that this
structure does not obviate the underlying one. Rather, any rigorous
derivation of the foregrounded structure seems to require not only the
text but also the preforegrounding hierarchies. Foregrounding operations
are viewed as analogous to syntactic transformations of raising and
lowering, in that both alter the graph structure without changing the
semantic content. In both cases, the effect is to create a foreground,
topic, or focus of emphasis. The crux of the problem is to identify
precisely what the structural cues to foregrounding are. Evidently
there are a number of cues which probably often covary with one another,
such as the sentence sequence, syntactic reduction or zeroing, and
frequency of recurrence. A crucial future problem is the investigation
of such areas, especially at a level of generality which is moderate
but sufficient to support psychological research. Then it will become
possible to proceed to a serious study of processes, envisaged as
creating the foregrounded mental representation (comprehension) and later
degrading it (memory).
One experiment has been completed, including the data analyses.
Three others have been run but the analyses are not yet finished. All
were intended mainly to aid in perfecting the methodology, and not to
yield any major psychological conclusions. The foremost methodological
aim was to exploit the theory in order to score the data in a
more [...], objective manner than has hitherto been possible.

The major objective of the completed experiment was to determine
whether or not it is indeed true that people tend to remember best the
gist of a passage. Each college student subject read four
paragraphs in counterbalanced order, one on each of the topics nebulae,
selenium, oceanography, and steel production. An ancillary
[...] variable was the superficial organization of the nebulae
paragraph: either the properties of nebulae foregrounded over the kinds
of nebulae, or else the reverse foregrounding (this depended chiefly on
sentence sequence). Also, each subject was tested at the end of the
[...] on the first of his four paragraphs. Then he received
a test seven days later on all paragraphs. On a test, he was
asked to write in his own words everything he could remember from
the paragraph. A pretest was given at the start of the first session,
to assess knowledge of each topic prior to reading about it.

The methodological innovation was in how the theme was determined.
The procedure now seems unsatisfactory, but at least it and the subsequently
devised improvements have the virtue of being explicitly definable
on the graph. At the time of scoring the data, the solution adopted
was to identify the gist with the higher, more abstract nodes in the
graph. (Abstracts are a matter of degree, not either-or.) Unfortunately,
the graph invoked was the underlying (preforegrounding) one,
[...] it was then available and recognized as necessary.

Of the three studies in progress, two resemble the above in that
they are devoted to rather traditional hypotheses, again using normal
passages instead of deliberately distorted passages. One of the two was
a replication of well-known studies on the mnemonic value of
advance organizers. There were two groups of subjects, one who read
only the paragraphs and the other who also read an abstract prior to each
paragraph. Total reading time was equated in the two groups. The
paragraphs were on nebulae, selenium, seed testing, and red blood cells.
Unlike the previous experiment, subjects were now tested on all
paragraphs in the first session; no delayed test was given. The test
order was a random order following the random training sequence.
It might be mentioned that an initial attempt to include
a third group who read a full outline rather than the text
was unsuccessful. That initial try at outlining yielded headings
which were inadvertently abstract and confusing to the subjects.



[...] innovation consisted of invoking the theory to generate
the abstracts, now presenting the theme explicitly instead of only
scoring it in the protocols. Again, the procedure was based on the
underlying, preforegrounding graph; however, the former "top-down"
procedure for generating the abstract was abandoned, for reasons to
be discussed in the summary of the results of the first study. This
time, a quasi-information procedure was substituted, as follows. Each
node was assigned a number, namely the product over its immediate
descendants of their own numbers of descendants. Then the top 10%
of the nodes by this measure were selected as comprising the theme.
The corresponding sentences were used to compose the abstracts.
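
The node measure described above can be sketched in a few lines of
Python. This is a hedged illustration, not the procedure as actually
programmed: the function names and the example tree are hypothetical, and
two assumptions are made to fill in what the text leaves open. First, a
child's "number of descendants" is taken to include the child itself, so
that a terminal child contributes 1 rather than forcing the whole product
to 0. Second, terminal nodes, which have no immediate descendants, are
given a score of 0.

```python
import math


def subtree_size(tree, node):
    """Number of nodes in the subtree rooted at `node`, counting `node` itself."""
    return 1 + sum(subtree_size(tree, child) for child in tree.get(node, []))


def node_score(tree, node):
    """Product, over the immediate descendants, of each one's subtree size."""
    children = tree.get(node, [])
    if not children:
        return 0  # assumption: terminal nodes are never thematic
    return math.prod(subtree_size(tree, child) for child in children)


def select_theme(tree, fraction=0.10):
    """Return the top `fraction` of nodes by score (at least one node)."""
    nodes = set(tree) | {c for children in tree.values() for c in children}
    # Ties are broken alphabetically; the text does not say how ties were handled.
    ranked = sorted(nodes, key=lambda n: (-node_score(tree, n), n))
    k = max(1, math.ceil(fraction * len(nodes)))
    return ranked[:k]


# Hypothetical ten-node graph: parent label -> list of immediate descendants.
example_tree = {
    "T": ["A", "B"],
    "A": ["a1", "a2", "a3"],
    "B": ["b1", "b2"],
    "a1": ["x", "y"],
}
theme = select_theme(example_tree)  # top 10% of ten nodes is one node
```

Under these assumptions the measure favors nodes whose immediate
descendants each dominate sizable subtrees, which is one reading of why
the procedure is called quasi-information. Counting descendants strictly
(excluding the child itself) would give a different ranking.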

The aim of the next experiment was to compare recall after four
days as a function of the integrative response executed immediately
after reading. There were three groups of subjects. Each began by
reading a paragraph, which was then withdrawn from view. Then,
depending on the group, the subject either attempted to write his recall
of the paragraph, or attempted to recall and organize (as an outline)
the passage, or read an outline of the passage. All groups were allowed
equal time for the integrative task. One question was whether or not
the additional organizing activities of outlining, beyond those
induced simply by recalling and writing the recall, would facilitate a
second recall. Another question was the pragmatic one of whether or
not the advantage of active (subject-produced) outlines over passive
ones (experimenter-produced) would offset the presumed greater semantic
acceptability of the latter. A further main goal of the experiment, in
particular the active outlining condition, was to furnish exploratory
data on comprehension, rather than just memory data as in the other
comparisons. How closely will a subject's outline reflect the
foregrounded graph? One would anticipate an overall concurrence, but the
explanation of any discrepancies is an open question. In fact, in some
cases a detailed analysis might suggest attributing the disparity to an
error in the foregrounded graph, not to a lapse in comprehension.

The other experiment in progress attacked a theoretical issue,
namely whether or not memory depends on the location of the item in the
graph structure. Unlike either the first experiment or research by
others on recall of hierarchically organized words, the point of the
design was to control for the lexical content itself. That is, the
aim was to assign the words randomly to the nodes, then construct the
rest of the sentence frame so as to avoid semantic anomalies. By using
a new random order with each subject, one could thereby separate
idiosyncratic lexical effects from effects due to the position in the
abstract graph. Sentences were contrived so that frequency of overt
presentation was constant over (most) nodes. Much trial and error was
necessary in order to construct artificial paragraphs suitable for the
experiment. To reduce the artificiality, each was then preceded and
followed by more natural-sounding sentences on the same topic. Also,
successive experimental paragraphs were separated by a buffer paragraph.
The order of events was: buffer paragraph, key paragraph, test buffer
paragraph, test key paragraph, then recycling with another randomly
(without replacement) selected pair until four of each had been
administered.
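
The randomization in this design can be sketched as follows. This is a
minimal Python illustration under stated assumptions, not the materials
actually used: the function names, node labels, and paragraph labels are
hypothetical, and the pairing of each buffer paragraph with a key
paragraph is assumed to be random as well.

```python
import random


def assign_words(node_ids, words, rng):
    """Place each content word at one node, using a fresh random order."""
    if len(node_ids) != len(words):
        raise ValueError("need exactly one word per node")
    shuffled = list(words)
    rng.shuffle(shuffled)
    return dict(zip(node_ids, shuffled))


def session_schedule(key_paragraphs, buffer_paragraphs, rng):
    """Order of events: read buffer, read key, test buffer, test key, repeated."""
    # Sampling the full list draws every paragraph once, without replacement.
    keys = rng.sample(key_paragraphs, len(key_paragraphs))
    buffers = rng.sample(buffer_paragraphs, len(buffer_paragraphs))
    events = []
    for buffer_par, key_par in zip(buffers, keys):
        events += [
            ("read", buffer_par),
            ("read", key_par),
            ("test", buffer_par),
            ("test", key_par),
        ]
    return events


# One hypothetical subject; a different seed would be used for each subject.
rng = random.Random(1971)
assignment = assign_words(["n1", "n2", "n3"], ["word_a", "word_b", "word_c"], rng)
schedule = session_schedule(["K1", "K2", "K3", "K4"], ["B1", "B2", "B3", "B4"], rng)
```

Because each subject receives a new assignment of words to nodes, recall
averaged over subjects at a given node reflects the node's position
rather than any particular word.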
Results: experiments

The main conclusion was that, by and large, the structure model
indeed provides a relatively objective and complete method for scoring
recall protocols. The first experiment, which is the only one whose
data have been analyzed to date, yielded instructive but somewhat
unexpected findings. The hypothesis that the theme would be recalled
better than the nonthematic content was rejected. Higher-level nodes
were not overtly recalled better than lower-level ones, even in those
cases where it seemed indisputable that the former were no more abstract
lexically than the latter. Nor were elements of the principal subtree
remembered more frequently than elements of the "parenthetic" subtrees.
What did correlate strikingly with recall was a node's frequency of
occurrence within the passage. This outcome can be interpreted as
another line of evidence indicating that the basis for predictions should
be the foregrounded graph, not the underlying one. A separate result,
and a rather puzzling one, was that memories for different subtopics
(subtrees) were statistically independent of each other. In
interpreting these and other findings from this experiment, it should be noted
that total recall was rather poor, averaging only about 20% of the
elements identified by the model (not counting elements known on the
pretest).
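
The recall percentage quoted above excludes elements already known on
the pretest. A minimal Python sketch of such a score is given here; the
function name and the element labels are hypothetical, and the sketch
assumes that a scorer has already matched each protocol against the
elements of the structure model.

```python
def recall_proportion(model_elements, recalled, pretest_known=frozenset()):
    """Share of model elements recalled, ignoring those known on the pretest."""
    scorable = set(model_elements) - set(pretest_known)
    if not scorable:
        return 0.0
    return len(scorable & set(recalled)) / len(scorable)


# Hypothetical protocol: twelve model elements, two known beforehand.
elements = [f"e{i}" for i in range(1, 13)]
known_before = {"e11", "e12"}
in_protocol = {"e1", "e2", "e11"}  # e11 earns no credit: it was known on the pretest
score = recall_proportion(elements, in_protocol, known_before)
```

In this made-up case two of the ten scorable elements are recalled, giving
a proportion of 0.2.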

As to the ancillary variables, neither produced statistically
significant results. However, their interaction was significant (p<.01);
in particular, the combination of "properties" organization with
immediate plus delayed testing yielded somewhat higher recall than did the
other three conditions.
Conclusions

A semantic model of prose, even of single paragraphs, is a
formidable task but an essential one if a theory of prose comprehension
and memory is ever to be developed. The present approach is unique in
the degree to which it emphasizes such a model, and shows promise of
achieving a model of at least limited generality. Apparently, the best
way to proceed is inductively, beginning with detailed analyses of
individual passages. The approach is a methodological advance, and
offers a framework within which psychological issues can now be
investigated more explicitly. Especially, the semantic structure of
prose and the scoring of prose recall data can now be accomplished more
adequately. However, a fundamental shortcoming of the approach is the
lack of a formal theory of foregrounding. This must be remedied in order
to pursue psychological applications. Future theoretical work will
concentrate on foregrounding, on generalizing the preforegrounding
principles, and on possible psychological processes. For the time being,
the main purpose of experiments will be to illuminate those
methodological issues.